Abstract

We introduce a new class of measurement matrices for compressed sensing, using low order summaries
over binary sequences of a given length. We prove recovery guarantees for three reconstruction
algorithms using the proposed measurements, including $\ell_1$ minimization and two combinatorial
methods. In particular, one of the algorithms recovers $k$-sparse vectors of length $N$ in sublinear
time $\mathrm{poly}(k \log N)$, and requires at most $\Omega(k \log N \log\log N)$ measurements. The
empirical oversampling constant of the algorithm is significantly better than that of existing
sublinear recovery algorithms such as Chaining Pursuit and Sudocodes. In particular, for
$10^{\ldots} \le N \le 10^{\ldots}$ and $k = 100$, the oversampling factor is between 3 and 8. We
provide preliminary insight into how the proposed constructions and the fast recovery scheme can be
used in a number of practical applications, such as market basket analysis and real-time compressed
sensing implementation.

1 Introduction 

Despite significant advances in the field of Compressed Sensing (CS), certain aspects of CS remain
relatively immature. Thus far, CS has been viewed primarily as a data acquisition technique [1]. As a
result, the applicability of CS to other computational applications has not received commensurate
investigation. In addition, to the best of the authors' knowledge, no unified CS system has been
implemented for practical real-time applications. A few recent works have addressed the former by
applying sparse reconstruction ideas to certain inference problems, including learning and adaptive
computational schemes (e.g. [2, ...]). Several other works have addressed the latter by designing
hardware which exploits the fact that CS enables the monitoring of a given bandwidth at a much lower
sampling rate than traditional Nyquist-based methods (see e.g. [6]). The motivating factor behind
these works is that, for a given maximum sampling rate achievable by digitizing hardware (limited by
the poor scaling of power consumption with sampling rate), it is possible either to acquire signals
over a much greater bandwidth, or to use much less power for a given bandwidth. Recent work inspired
by this line of thought has led to the development of hardware CS encoders (see e.g. [..., 9, 10]).
However, none of the previous works addresses the problem of real-time signal decoding, which is a
critical requirement in many applications.

Although they vary with the nature of the problem and its physical constraints, two fundamental
issues in practical implementations of CS are perhaps the following: 1) the construction of
measurement matrices that are provably good, certifiable and inexpensive to implement (either as
real-time sketches or as pre-built constructions), and 2) time-efficient and robust recovery
algorithms. Our aim is to introduce and analyze a sparse reconstruction system that addresses these
two problems, and to point toward extensions of CS in less explored directions.

We introduce a new class of measurement matrices for sparse recovery that are deterministic,
structured and highly scalable. The constructions are based on labeling the ambient state space with
binary sequences of length $n = \log_2 N$, and summing up entries of $x$ that share the same pattern
(up to a fixed length) at various locations in their labeling sequences. The corresponding matrices
do not satisfy the RIP, yet they are compatible with Basis Pursuit, which is a standard technique for
sparse reconstruction [11]. In addition, we provide two efficient combinatorial algorithms, along
with theoretical guarantees, for the proposed measurement structures. The proposed algorithms are
sublinear in the ambient dimension of the signal. In particular, we propose a summarized support
index inference (SSII) algorithm with a running time of $O(\mathrm{poly}(k \log N))$ that requires
$O(k \log N \log\log N)$ measurements to recover $k$-sparse vectors, and whose empirical
oversampling factor is significantly better than that of existing sublinear methods. Due to the
particular structure of the measurements and decoding algorithms, we believe that the proposed
compression/decompression framework is amenable to real-time CS implementation, and offers
significant simplification in the design of existing CS encoders/decoders. Furthermore, observations
collected based on the proposed constructions appear as low order statistics or "summaries" in a
number of practical situations in which a similar intrinsic labeling of the state space exists. This
includes certain inference and discrete optimization problems such as market basket (commodity
bundle) analysis, advertising, online recommendation systems, genomic feature selection, social
networks, etc.

It should be acknowledged that there are various results on sublinear sparse recovery in the
literature, including [12, 13, ..., 16]. Unlike most previous works, the constructions of this paper
have a sublinear storage requirement and are compatible with the practical scenarios that we
consider. The recovery time of the algorithm is sublinear in the signal dimension, and the empirical
recovery bounds are significantly better than those of existing sublinear algorithms, such as
Chaining Pursuit and Sudocodes, especially for small and moderate sparsity levels and very large
signal dimensions.

2 Proposed Measurement Structures 

We define a class of structured binary measurement matrices, based on the following definition.

Definition 1. Let $m$, $n$ and $d$ be integers. An $(n,d)$ summary is a pair $\mathcal{X} = (S, c)$,
where $S$ is a subset of $\{1, 2, \ldots, n\}$ of size $d$, and $c$ is a binary sequence of length
$d$. An $(m,n,d)$ summary codebook is a collection
$\mathcal{C} = \{(S_i, c_j) \mid 1 \le i \le m,\ 0 \le j \le 2^d - 1\}$ of $(n,d)$ summaries, where
the $S_i$'s are distinct subsets, and $c_j$ is the length-$d$ binary representation of the integer
$j$. If $m = \binom{n}{d}$, $\mathcal{C}$ is called the complete $(n,d)$ summary codebook.

To a given $(m,n,d)$ summary codebook $\mathcal{C}$, we associate a binary matrix $A$ of size
$M \times N$, where $M = 2^d \times m$ and $N = 2^n$, in the following way. For every
$(S, c) \in \mathcal{C}$, there is a row $a = (a_1, \ldots, a_N)$ in $A$ that satisfies:

$$a_j = \mathbf{1}\{b_j(S) = c\}, \quad 1 \le j \le N \qquad (1)$$

where $b_j$ is the $n$-bit binary label of column $j$ (the binary representation of $j - 1$, as in
the example below), and $b_j(S)$ is the subsequence of the binary sequence $b_j$ indexed by the
entries of the set $S$. In other words, $a$ has a 1 in those columns whose binary labels conform to
$(S, c)$. Every column of $A$ has exactly $m$ ones, and each row has exactly $2^{n-d}$ ones. To
clarify this definition, we consider the example illustrated in Figure 1, in which $n = 4$ and
$d = 2$. Suppose that a summary $(S, c)$ is given with $S = \{1, 2\}$ and $c = 10$. All binary
sequences of length 4 that match $(S, c)$ are listed in Figure 1. To find the column indices
corresponding to the listed labels, we convert them to decimal values and add 1, which gives 9, 10,
11 and 12. The row $a$ of a measurement matrix that includes this summary is a vector of length
$2^4$ that has a 1 in those indices, as displayed.

Labels matching $(S, c)$ with $S = \{1, 2\}$, $c = 10$: 1000, 1001, 1010, 1011
$a = (0000000011110000)$

Figure 1: An example (4,2) summary and the corresponding row of the structured measurement matrix.
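The construction of equation (1) and the example of Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `label`, `summary_row` and `complete_codebook_matrix` are hypothetical, and the code assumes that bit position 1 is the most significant bit of the label, which is the convention that matches the example ($S = \{1,2\}$, $c = 10$ selecting columns 9 to 12). It enumerates all $N = 2^n$ columns explicitly, so it is meant for small $n$ only.

```python
from itertools import combinations, product


def label(j, n):
    """n-bit label of column j (1-based): binary representation of j - 1."""
    return format(j - 1, f"0{n}b")


def summary_row(S, c, n):
    """Row of A for the summary (S, c), following equation (1).

    S is a set of 1-based bit positions (position 1 = leftmost bit, an
    assumption matching Figure 1) and c is a string of len(S) bits.
    """
    positions = sorted(S)
    row = []
    for j in range(1, 2 ** n + 1):
        b = label(j, n)
        restricted = "".join(b[p - 1] for p in positions)
        row.append(1 if restricted == c else 0)
    return row


def complete_codebook_matrix(n, d):
    """All rows of the matrix built from the complete (n, d) summary codebook."""
    rows = []
    for S in combinations(range(1, n + 1), d):
        for bits in product("01", repeat=d):
            rows.append(summary_row(S, "".join(bits), n))
    return rows


# The summary of Figure 1: ones in columns 9, 10, 11 and 12.
row = summary_row({1, 2}, "10", 4)
support = [j for j, v in enumerate(row, 1) if v]
```

For the complete $(4,2)$ codebook this produces $M = 2^2 \binom{4}{2} = 24$ rows, with $2^{n-d} = 4$ ones per row and $m = 6$ ones per column, in agreement with the counts stated above.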

The defined matrices are well motivated by a number of practical problems. In general, whenever the
given signal space retains an intrinsic structured labeling similar to the one described, such
constructions prove very useful. In particular, we consider the following two motivating examples.

Resource Optimization. Assume that a set $\mathcal{F} = \{F_1, F_2, \ldots, F_n\}$ of features (or
parameters) is available, and assume that certain collections of features form "lucrative" profiles
(structures). In particular, a lucrative profile can be a subset of features, which is representable
by a binary sequence $b = b_1 b_2 \cdots b_n$, where $b_i$ indicates the presence of the $i$-th
feature. A practical assumption is that lucrative profiles are limited and weighted, meaning that
their profitabilities vary. The vector $x = (p_1, p_2, \ldots, p_{2^n})^T$ formed by the respective
profits of all feature collections is thus an approximately sparse vector. Furthermore, the available
information about the profitability of profiles is often derived from a pool of observations or
real-world implementations, and is mostly given in the form of summaries. More formally, what can be
learned is the average profitability of a certain configuration of only $d$ features. For example, it
can be assessed that when $F_1$ and $F_2$ are present and a third feature is absent, regardless of
all other features, the average profit is some $p$. The collection of summaries forms an observation
vector $y$ that is related to $x$ through a set of linear equations $y = Ax$, where $A$ has a form
similar to those obtained from summary codebooks. This setting arises in many practical applications
such as market basket (commodity bundle) analysis, where the objective is to configure the structure
of a market so that it best complies with the needs and behaviors of the customers. To that end, it
is essential to understand which market configurations are winning, what packages of features (e.g.
commodities, pricing options, interest rates, etc.) should be offered to customers, and in what
proportions. Furthermore, the customers' behavioral information is often given in terms of high-level
summaries, e.g. along the lines of the statement "people who buy A and B are likely to buy C".

Footnote 1: Note that these structured matrices can be defined over any finite alphabet other than
the binary field.

Compressed Sensing Hardware. A few factors severely limit the scalability of existing CS hardware
designs to larger problem dimensions. One of these factors is the generation of the measurement
matrix $A$. In the simplest existing design, $A$ is typically a pseudo-random matrix generated with
a linear feedback shift register (LFSR) [..., 10]. The timing synchronization of a large number of
measurements, as well as the planar nature of physical implementations, is very limiting. Using a
more structured matrix may allow considerable simplification and reduction of the required hardware,
easing some of the previously mentioned limitations. The measurement structure defined in this work
is potentially highly amenable to the implementation of practical CS hardware, for the following two
reasons. 1) There exist simple sublinear recovery algorithms for the proposed matrices, other than
the linear programming method. This is elaborated in the following sections. 2) Due to the highly
structured design, the integration matrix $A$ can be implemented using one single LFSR seed and a
number of asynchronous digital circuits. Due to lack of space, and since it falls outside the scope
of this paper, we omit a detailed description of the latter and postpone it to future work.

3 Proposed Recovery Algorithms 

For the measurement matrices described in the previous section, we propose three reconstruction
algorithms and provide success guarantees. These algorithms include the Basis Pursuit algorithm
(a.k.a. $\ell_1$ minimization), as well as two fast algorithms that can recover sparse vectors from
a sublinear number of measurements and in a sublinear amount of time. The detailed specifications
are given in the sequel. For the theoretical arguments that appear in the remainder of this section,
we need to define the following notions:

Definition 2. Let $n$ and $l$ be integers with $l < n$. We define $f_S(n, l)$,
$f_W(n, l, p, \epsilon)$ and $f'_W(n, l, p, \epsilon)$ to be the largest integer $k$ such that when
$k$ binary sequences of length $n$ are selected at random, the following happens, respectively:

1. With probability 1, there exists an $(n, l)$ summary that appears in exactly one of the sequences.

2. With probability at least $p$, for each of the binary sequences, at least a fraction $\epsilon$
of its $(n, l)$ summaries are unique.

3. With probability at least $p$, for each of the binary sequences, at least a fraction $\epsilon$
of its $(n, l)$ summaries that include the first bit are unique.

It is important to note that the recovery guarantees of the presented combinatorial algorithms are
only valid for a class of vectors in which no two disjoint subsets of nonzero coefficients have
exactly the same sum. For simplicity, we refer to these vectors as "distinguishable" signals. This
restriction does not apply to Basis Pursuit.
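The two notions above can be checked by brute force on small instances. The following is a minimal sketch under stated assumptions: `unique_summary_fractions` and `is_distinguishable` are hypothetical helper names, labels are given as bit strings, and the distinguishability test requires all subset sums (including the empty sum) to be distinct, which is a slightly stricter reading of the definition because it also rules out a nonempty subset summing to zero. Both functions are exponential in their inputs and are meant only to make the definitions concrete.

```python
from itertools import combinations


def unique_summary_fractions(labels, l):
    """For each label, the fraction of size-l position subsets S on which
    no other label shows the same bit pattern (a "unique" summary)."""
    n = len(labels[0])
    subsets = list(combinations(range(n), l))
    fractions = []
    for i, b in enumerate(labels):
        unique = 0
        for S in subsets:
            pattern = [b[p] for p in S]
            others = (o for t, o in enumerate(labels) if t != i)
            if all([o[p] for p in S] != pattern for o in others):
                unique += 1
        fractions.append(unique / len(subsets))
    return fractions


def is_distinguishable(values):
    """True if all 2^k subset sums of the nonzero values are distinct.

    Assumption: exact arithmetic (e.g. integers); with floats a tolerance
    would be needed.
    """
    sums = [0]
    for v in values:
        sums += [s + v for s in sums]
    return len(set(sums)) == len(sums)


# Two labels that agree only on their first two bits: 5 of the 6 pairs of
# positions separate them.
fractions = unique_summary_fractions(["0000", "0011"], 2)
```

For instance, the values 1, 2, 4 are distinguishable, while 1, 2, 3 are not, since $1 + 2 = 3$.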

3.1 Basis Pursuit 

The success of the Basis Pursuit algorithm for recovering sparse signals is certified by several
conditions. Two major classes of conditions are the Restricted Isometry Property (RIP) and the null
space property [..., 18]. It can be shown that the measurement structures defined in this paper do
not satisfy the RIP, due to the existence of columns with fairly large coherence. This however does
not rule out the suitability of these constructions for $\ell_1$ minimization, since the RIP is known
to provide only a sufficient condition (see e.g. [...]). Instead, we prove that certain null space
conditions hold for the considered class of matrices, and therefore provide a sparse signal recovery
bound for $\ell_1$ minimization. We restrict our attention to nonnegative vectors in this case. The
reconstruction method is the following program, with the additional nonnegativity constraint:

$$\text{minimize } \|x\|_1 \quad \text{subject to } Ax = y,\ x \ge 0. \qquad (2)$$

The performance of the above program was studied for 0-1 matrices in [...]. In particular, it was
shown that a nonnegative vector $x$ can be recovered from (2) if and only if it is the unique
nonnegative solution of the linear system of equations, which is stated formally in the following
lemma.

Lemma 3.1 (from [...]). Suppose $A \in \mathbb{R}^{M \times N}$ is a matrix with constant column sum,
and $x_0 \in \mathbb{R}^{N}$ is a nonnegative vector. Then $x_0$ is the unique solution to (2) if and
only if $x_0$ is the unique nonnegative solution to $Ax = Ax_0$.

Using the above lemma, we can evaluate the performance of the Basis Pursuit algorithm when used with
the presented measurement matrices. The following theorem is fundamental to this analysis.

Theorem 3.1 (Strong Recovery for Basis Pursuit). Let $k \le f_S(n, d-1)$ be an integer, and let $A$
correspond to a complete $(n, d)$ summary codebook. Then every $k$-sparse nonnegative vector $x$ is
perfectly recovered by (2).

Proof. Let $k \le f_S(n, d-1)$ and let $x_0$ be a nonnegative $k$-sparse vector. Also, let the
$n$-bit binary labels associated with the support set of $x_0$ be $b_1, b_2, \ldots, b_k$. We show
that if $A$ corresponds to a complete $(n, d)$
summary codebook, then $x_0$ is the unique nonnegative solution to $Ax = Ax_0$. Therefore, by
Lemma 3.1, it follows that $x_0$ can be recovered via (2). We prove this by contradiction. Suppose
that there is another nonnegative vector $x \ne x_0$ with $Ax = Ax_0$. Due to the nonnegativity
assumption, we may assume that the support sets of $x$ and $x_0$ do not overlap. Let the $n$-bit
labels of the support set of $x$ be the binary sequences $b'_1, b'_2, \ldots, b'_{k'}$. From the
definition of $f_S(\cdot)$, we can assert that there is an $(n, d-1)$ summary that appears in exactly
one of the sequences $b_1, \ldots, b_k$. Let us assume without loss of generality that the first
$d - 1$ bits of $b_1$ are unique, and that $b_1$ is the all-zero binary sequence. Therefore, there
are at least $n - d + 1$ measurements in $y = Ax_0$ that are equal to the entry of $x_0$ that
corresponds to the label $b_1$. These measurements are those that correspond to the summaries

$$(\{1, 2, \ldots, d-1, i\}, \mathbf{0}), \quad d \le i \le n \qquad (3)$$

where $\mathbf{0}$ denotes the all-zero sequence of length $d$. Since $Ax = Ax_0$, there must be
nonzero entries in $x$ whose labels satisfy the above summaries. In particular, without loss of
generality, assume that the first $d$ bits of $b'_1$ are all zero. However, since the support sets
of $x$ and $x_0$ do not overlap, $b'_1$ differs from $b_1$ in at least one bit, say
$b_1(j) \ne b'_1(j)$ for some $j > d$. Now consider the summary
$(S, c) = (\{1, 2, \ldots, d-1, j\}, 00\ldots01)$, which represents the set of all binary sequences
that are zero on the first $d - 1$ bits and one on the $j$th bit. Because $A$ corresponds to a
complete $(n,d)$ codebook, there is a row of $A$ that is based on $(S, c)$, and moreover the
corresponding value of $y$ is nonzero, because $b'_1$ conforms to $(S, c)$. On the other hand, this
cannot be true when considering the equations $y = Ax_0$, because it requires that one of the labels
$b_1, \ldots, b_k$ conform to $(S, c)$, which cannot be $b_1$ (recall that $b_1$ is the all-zero
codeword, whereas $c$ includes a 1). The existence of such a label contradicts the assumption that
$b_1$ is the only label whose first $d - 1$ bits are all zero. ■



The complexity of Basis Pursuit is generally polynomial in the ambient dimension of the signal.
Specifically, one can implement (2) in $O(N^3)$ operations, without exploiting any of the available
structural information of the measurement matrix. Although there are some advantages to Basis
Pursuit, such as robustness to noise, its complexity is impractical for problems where $N$ scales
exponentially. In these situations, sublinear time algorithms are preferred.

3.2 Summarized Support Index Inference 

The first sublinear algorithm discussed in this subsection is called the summarized support index inference 
(SSII). The algorithm is based on iteratively inferring the nonzero entries of the signal based on one of 
the distinct values of y and its various occurrences. The method is described below. 

Algorithm 1 SSII

1: Repeat until all nonzeros of $x$ are identified.
2: Identify distinct nonzeros of $y$, and exhaust the following:
3: Consider all occurrences of a value $y_{\pi(1)} = \cdots = y_{\pi(t)}$.
4: Construct a binary sequence $b$ by setting $b(S_{\pi(i)}) := c_{\pi(i)}$, $\forall\, 1 \le i \le t$,
   where $(S_j, c_j)$ is the summary corresponding to measurement $y_j$.
5: If $b$ is fully characterized without conflict from the previous step, then a nonzero entry of $x$
   has been determined; subtract it, update $y$ and go to step 2. Otherwise, exhaust the following step.
6: Find a subset $S'$ such that among the summaries $(S', c)$ that do not contradict $b$, exactly one
   corresponds to a nonzero of $y$, say $(S', c')$, and set $b(S') := c'$.

At the beginning of the algorithm, distinct nonzero values of the observations $y$ are identified and
separated from the zero values. Due to the distinguishability assumption on $x$, each distinct
nonzero value of $y$ is the sum of a unique subset of nonzeros of $x$, and can thus be used to infer
the position of at least one nonzero entry. The index of a nonzero entry of $x$ is determined by its
unique labeling,
which is a binary sequence of length $n$. Therefore, the algorithm attempts to infer all relevant
binary sequences. Suppose that a nonzero value of $y$ is chosen that has $t$ occurrences, say,
without loss of generality, $y_1 = y_2 = \cdots = y_t$. Also, let the $(n,d)$ summary which
corresponds to the $i$th row of $A$ be denoted by $(S_i, c_i)$ (see equation (1)). The algorithm
explores the possibility that $y_1, y_2, \ldots, y_t$ are all equal to a single nonzero entry of $x$,
by trying to build a binary sequence $b$ that conforms to the summaries $\{(S_i, c_i)\}_{i=1}^{t}$,
i.e., by setting:

$$b(S_i) := c_i, \quad \forall\, 1 \le i \le t \qquad (4)$$

If there is a conflict in the set of equations in (4), then that value of $y$ is discarded in the
current iteration, and the search is continued for other values. Otherwise, two events may occur. If
(4) uniquely identifies $b$, then the position and value of one nonzero of $x$ have been determined.
It is subtracted, the measurements are updated and the algorithm continues. However, it may happen
that only $n_1 < n$ bits of $b$ are determined by (4). In this case, we use the zero values of $y$
to infer the remaining $n - n_1$ bits in the following way. Let the sets of known and unknown bits
of $b$ be denoted by $S_1$ and $S_2$, respectively. We consider the summaries $(S, c)$ which
contribute to $A$, and among them consider all distinct subsets $S$. If there is a subset $S'$ such
that among all the measurements corresponding to $(S', c)$, where $c$ does not conflict with
$b(S')$, exactly one is nonzero, say the one for $(S', c')$, then the bits of $b$ over
$S' \cap S_2$ can be uniquely determined by setting $b(S') = c'$. This procedure is repeated until
either $b$ is completely identified, or all possibilities are exhausted. A high-level description of
the presented method is given in Alg. 1, for which we can assert the following strong and weak
recovery guarantees.

Theorem 3.2 (Strong Recovery for SSII). Let $k \le f_S(n, d-1)$ be an integer, and let $A$ correspond
to a complete $(n, d)$ summary codebook. Then every $k$-sparse distinguishable vector $x$ is
perfectly recovered by Alg. 1.

Proof. Let $k \le f_S(n, d-1)$ and let $x$ be a $k$-sparse vector. Also, let the $n$-bit binary
labels associated with the support set of $x$ be $b_1, b_2, \ldots, b_k$. We show that at least one
of these labels can be inferred from one of the nonzero values of the vector $y = Ax$, by solving
(4). From the definition, there is an $(n, d-1)$ summary that appears in exactly one of the labels
$b_1, b_2, \ldots, b_k$. Without loss of generality, let us assume that the first $d - 1$ bits of
$b_1$ are unique, and that $b_1$ is the all-zero binary sequence. Also, let the nonzero value of $x$
in the position given by $b_1$ be $\gamma$. Now consider all summaries $(S, c)$ for which the value
of the corresponding entry in $y$ is equal to $\gamma$. Let these summaries be denoted by
$\{(S_i, c_i)\}_{i=1}^{t}$, where $t$ is the number of occurrences of $\gamma$ in $y$. We show that
there is a unique binary sequence $b'$ that conforms to all of these summaries. In other words, we
prove that equation (4) has a unique solution, which is equal to $b' = b_1$.

Due to the distinguishability assumption on the nonzero values of $x$, the set
$\{(S_i, c_i)\}_{i=1}^{t}$ should include the following summaries:

$$(\{1, 2, \ldots, d-1, i\}, \mathbf{0}), \quad d \le i \le n \qquad (5)$$

where $\mathbf{0}$ indicates the all-zero bit sequence of length $d$. Clearly, the only length-$n$
binary sequence that conforms to all of the above summaries is the all-zero binary sequence, namely
$b_1$. Thus, we only need to show that $b_1(S_i) = c_i$ for all other summaries $(S_i, c_i)$,
$1 \le i \le t$. This also follows immediately from the distinguishability assumption on $x$, and
the fact that every instance of $\gamma$ in the vector $y$ is only the result of the nonzero value
in $x$ labeled by $b_1$ (i.e. it is not the sum of another subset of the entries of $x$). ■

Theorem 3.3 (Weak Recovery for SSII). Let $k \le f'_W(n, d, p, \epsilon)$ be an integer, and let $A$
correspond to a random $(m, n, d)$ summary codebook. Then a random $k$-sparse distinguishable vector
$x$ is recovered by Alg. 1 with probability at least
$1 - kn\left(1 - p + p\left(1 - \frac{\epsilon d}{n}\right)^m\right)$.

Proof. We define an event $\mathcal{E}$ which is stronger than the success event of Algorithm 1,
namely a sufficient condition for the success of SSII. Let the $n$-bit binary labels associated with
the support set of $x$ be $b_1, b_2, \ldots, b_k$, and let $\mathcal{C}$ be the $(m, n, d)$ summary
codebook based on which $A$ is constructed. The sufficient condition for the success of SSII is that
for every $1 \le i \le k$ and every bit $1 \le j \le n$, there exists a summary $(S, c)$ in
$\mathcal{C}$ such that $j \in S$, $b_i(S) = c$, and $b_{i'}(S) \ne c$ for all $i' \ne i$. In other
words, for each of the $k$ labels corresponding to the support of $x$ and each of the $n$ bits,
there is a summary in the codebook that includes the considered bit and only conforms to that
particular label.

We find an upper bound on the probability of the complementary event $\mathcal{E}^c$ by using union
bounds. Note that there are $m$ distinct subsets in the summaries of the codebook $\mathcal{C}$,
which are chosen randomly. We assume that the subsets are chosen independently at random, and allow
repetition. In case of repetition, the repeated subset is excluded, which only makes things worse.
Consider a label $b_1$ and the first bit. The probability that a randomly chosen subset of $d$ bit
positions includes the first bit is $d/n$. Furthermore, suppose that at least a fraction $\epsilon'$
of the summaries that conform to $b_1$ and include the first bit do not conform to the remaining
$b_i$'s (i.e. they only appear in $b_1$). Then, when a random subset $S$ is chosen, with probability
at least $\epsilon' d / n$ the following happens:

$$1 \in S \ \text{ and } \ b_i(S) \ne b_1(S) \quad \forall\, 1 < i \le k \qquad (6)$$

Therefore, the probability that the above event does not happen for any of $m$ randomly chosen
subsets $S$ is at most $(1 - \epsilon' d / n)^m$. From the definition of $f'_W(\cdot)$ and the fact
that $k \le f'_W(n, d, p, \epsilon)$, we know that with probability at least $p$,
$\epsilon' \ge \epsilon$, and therefore the probability that (6) does not happen for any set $S$ in
the codebook $\mathcal{C}$ is at most $1 - p + p(1 - \epsilon d / n)^m$. If we union bound the
probability of such an event over all $k$ labels and all $n$ bits, we conclude that the probability
of the undesirable event $\mathcal{E}^c$ is bounded by:

$$\mathbb{P}(\mathcal{E}^c) \le nk\left(1 - p + p\left(1 - \frac{\epsilon d}{n}\right)^m\right) \qquad (7)$$

which concludes the proof of the theorem. ■
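To get a feel for how the guarantee of Theorem 3.3 behaves as the number $m$ of random subsets grows, the bound can simply be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, the parameter values in the usage line are arbitrary, and the values of $p$ and $\epsilon$ are treated as given inputs rather than derived from Definition 2.

```python
def ssii_success_lower_bound(k, n, d, m, p, eps):
    """Lower bound of Theorem 3.3: 1 - k*n*(1 - p + p*(1 - eps*d/n)**m)."""
    return 1 - k * n * (1 - p + p * (1 - eps * d / n) ** m)


def smallest_m(k, n, d, p, eps, target, m_max=100000):
    """Smallest m (up to m_max) for which the bound reaches `target`.

    Returns None when the target cannot be reached: as m grows the bound
    only approaches 1 - k*n*(1 - p), so p must be close enough to 1.
    """
    if 1 - k * n * (1 - p) <= target:
        return None
    for m in range(1, m_max + 1):
        if ssii_success_lower_bound(k, n, d, m, p, eps) >= target:
            return m
    return None


# Arbitrary illustrative values: k = 10 nonzeros, n = 20 bits, d = 4.
m_needed = smallest_m(k=10, n=20, d=4, p=1.0, eps=0.5, target=0.9)
```

Since the union bound is over $kn$ events, the guarantee is informative only when $1 - p$ is small compared with $1/(kn)$, which is why the second function returns `None` for values of $p$ that are too small.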

The explicit recovery bounds given by the above theorem are calculated in Section 4. Alg. 1 can be
implemented very efficiently, with $O(\max(\mathrm{poly}(M), k \log N))$ operations, which is
sublinear in the dimension of the problem. The computational advantage is owed for the most part to
the structural definition of the measurement matrices, which facilitates sublinear search over the
column space of the matrix. In addition, we do not require exponential memory for decoding, since
the information about $A$ and the currently inferred indices of the unknown vector at each stage can
be retained by storing only the corresponding binary indices.
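The core consistency step of SSII, equation (4), only manipulates the binary labels and never touches the $N$ columns of $A$ explicitly. A minimal sketch of that single step is given below; it is not the full algorithm. The name `merge_summaries` is hypothetical, summaries are represented as pairs of a set of 1-based positions and a bit string, and position 1 is assumed to be the leftmost bit of the label.

```python
def merge_summaries(summaries, n):
    """Try to build a label b with b(S_i) = c_i for every summary (eq. (4)).

    Returns a list of n entries in {'0', '1', None} (None = bit not yet
    determined), or None if two summaries assign different values to the
    same bit, in which case the candidate value of y is discarded.
    """
    b = [None] * n
    for S, c in summaries:
        for pos, bit in zip(sorted(S), c):
            if b[pos - 1] is not None and b[pos - 1] != bit:
                return None  # conflict between two summaries
            b[pos - 1] = bit
    return b


# Summaries of the form ({1, ..., d-1, i}, 0) with d = 2 and n = 4 pin down
# the all-zero label, as in the proof of Theorem 3.2.
b = merge_summaries([({1, 2}, "00"), ({1, 3}, "00"), ({1, 4}, "00")], 4)
```

When some entries of the result are still `None`, the algorithm falls back on the zero-valued measurements, as described in the text, to determine the remaining bits.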

3.3 Mix and Match Algorithm 

We describe a third recovery method, which is along the lines of the algorithm proposed in [2], with
slight modifications. The algorithm consists of two subroutines: a value identification phase, in
which the nonzero values of the unknown signal are determined, and a second phase for identifying
the support set of $x$. The method is based on measurements given by
$y = (y^{(1)T}, y^{(2)T})^T = (A_1^T, A_2^T)^T x$, where only $y^{(1)}$ is used for the first phase,
and $y^{(2)}$ and $A_2$ are used in the second phase. For details of this method please refer to
[2]. We analyze this algorithm for the measurement structures proposed in this paper, which is
different from the analysis of [2].

Algorithm 2 M&M

1: Find the set $Y$ of nonzero entries of $y^{(1)}$ and set $X = \emptyset$. $X$ will determine the
   set of nonzeros of $x$. Repeat steps 2 and 3 until $S(X) = Y$.
2: Update the set $S(X)$ of sums of subsets of $X$.
3: Find the smallest entry of $Y$ that is not in $S(X)$, and add it to $X$.
4: Initialize all-zero binary sequences $\{b_x \mid x \in X\}$, which will determine the labeling of
   the support indices of $x$.
5: For every nonzero entry of $y^{(2)}$, find the corresponding summary $(S, c)$ and a subset
   $X' \subseteq X$ whose elements sum to that value of $y^{(2)}$. Set $b_x(S) = c$,
   $\forall x \in X'$.
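The value identification phase (steps 1 to 3 of Alg. 2) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `subset_sums` and `identify_values` are hypothetical, the nonzeros of $x$ are assumed positive and distinguishable, every nonzero of $x$ is assumed to appear alone in at least one measurement, and exact (integer) arithmetic is assumed so that sums can be compared with `==`. The loop stops as soon as every observed value is explained by a subset sum of $X$.

```python
def subset_sums(values):
    """Set of sums of all nonempty subsets of `values` (2^k growth)."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    sums.discard(0)  # drop the empty sum; the values are assumed positive
    return sums


def identify_values(y1):
    """Value identification phase of M&M on the measurements y^(1).

    Repeatedly adds to X the smallest observed value that is not yet a
    subset sum of X; for positive entries such a value must be a single
    nonzero entry of x.
    """
    Y = {v for v in y1 if v != 0}
    X = []
    while True:
        missing = sorted(Y - subset_sums(X))
        if not missing:
            return X
        X.append(missing[0])


# Measurements generated by the nonzero values 1, 5 and 20 and their sums.
values = identify_values([6, 1, 0, 5, 21, 20, 26, 1])
```

The call to `subset_sums` enumerates up to $2^k$ sums, which reflects the exponential dependence on $k$ of the running time of this method.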

Theorem 3.4 (Weak Recovery for M&M). Let $k \le f_W(n, d, p, \epsilon)$ be an integer, and
$A = (A_1^T, A_2^T)^T$, where $A_1$ and $A_2$ correspond to a random $(m, n, d)$ summary codebook and
a complete $(n, 1)$ summary codebook, respectively. Then a random nonnegative $k$-sparse
distinguishable vector $x$ is recovered by Alg. 2 with probability at least
$p\left(1 - k(1 - \epsilon)^m\right)$.

Proof. Let the $n$-bit binary labels associated with the support set of $x$ be
$b_1, b_2, \ldots, b_k$, and let $\mathcal{C}_1$ be the $(m, n, d)$ summary codebook based on which
$A_1$ is constructed. It can be shown that the value identification subroutine of Alg. 2 identifies
all nonzero values of the nonnegative vector $x$ correctly if every nonzero value of $x$ appears at
least once in the observation vector $y^{(1)}$. We find the probability that this condition holds
when the $m$ subsets of the random codebook $\mathcal{C}_1$ are chosen at random. For every
$1 \le i \le k$, we define the following set of subsets of $\{1, 2, \ldots, n\}$:

$$\mathcal{U}_i = \{S \mid |S| = d,\ b_j(S) \ne b_i(S)\ \ \forall j \ne i\} \qquad (8)$$

If a subset $S$ in the codebook $\mathcal{C}_1$ belongs to $\mathcal{U}_i$, then the nonzero entry
$\gamma_i$ that corresponds to the label $b_i$ appears in the observation vector. Therefore, we are
interested in finding the probability that the set of $m$ subsets of $\mathcal{C}_1$ has a nonempty
overlap with all $\mathcal{U}_i$'s. Let us assume that for some $\epsilon' > 0$, the following
holds:

$$|\mathcal{U}_i| \ge \epsilon' \binom{n}{d}, \quad \forall\, 1 \le i \le k \qquad (9)$$

When a subset $S$ is chosen at random, the probability that it belongs to $\mathcal{U}_i$ is at
least $\epsilon'$. Therefore the probability that $\mathcal{U}_i$ does not overlap with the set of
all subsets $S$ appearing in $\mathcal{C}_1$ is at most $(1 - \epsilon')^m$. Using a union bound
over all $1 \le i \le k$, we conclude that the probability that this undesirable event happens for
at least one of the sets $\mathcal{U}_i$ is at most $k(1 - \epsilon')^m$, which means that the
probability of success is at least $1 - k(1 - \epsilon')^m$. However, we know from the definition of
$f_W(\cdot)$, and the fact that $k \le f_W(n, d, p, \epsilon)$, that with probability at least $p$
we have $\epsilon' \ge \epsilon$. Conditioning on this event, the overall probability of success is
therefore at least:

$$p\left(1 - k(1 - \epsilon)^m\right) \qquad (10)$$

■



The complexity of Alg. 2 is $O(\max(\mathrm{poly}(M), 2^k))$, and thus explodes when $k$ grows.

4 Recovery Bounds 

We derive recovery bounds for (2) and Algs. 1 and 2 by obtaining explicit bounds on the terms of
Definition 2 and substituting them into the recovery guarantees of Section 3, namely Theorems
3.1-3.4. The proof of the following lemma is based on some combinatorial techniques and Chernoff
concentration bounds.



Lemma 4.1. Let n,l and k be integers and < a < 1/2. Also, let e = 1 — k^'^^^'^^ ^°^)/(")' ^^'^ 

1. $f_S(n, l) \ge 2^l$.

2. $f_W(n, l, p, \epsilon) \ge k$.

3. $f'_W(n, l, p, \epsilon) \ge f_W(n - 1, l - 1, p, \epsilon)$.



By substituting the expressions of the above lemma into Theorems 3.1-3.4, we obtain the following
bounds for the different methods:

Basis Pursuit. If a complete $(n,d)$ summary codebook is used to build $A$, then the number of
measurements is $M = 2^d \binom{n}{d}$, and every sparsity $k \le 2^{d-1}$ is guaranteed to be
recovered. When put together (recall that $n = \log_2 N$), an upper bound on the required number of
measurements for recoverable
sparsity $k$ is given by:

$$M = 2k \binom{\log_2 N}{\log_2 k + 1} \qquad (11)$$

In particular, for small values of $k$, the above bound is comparable with the
$M = 2k \log(\cdot)$ bound of $\ell_1$ minimization for random Gaussian matrices [19].



SSII Algorithm. We focus on the weak bound, namely the one obtained from Theorem 3.3. The general
strategy is to take the values of $p$ and $\epsilon$ according to Lemma 4.1 with $l = d - 1$, and to
choose $k$ and $m$ in such a way that, firstly, $\epsilon$ is bounded away from zero, and secondly,
the probability of recovery failure
approaches zero as n — )• oo. Taking k = A2~'^'°S2(\/q72+i/2) fQj. ggme < A < 1, a few basic algebraic 
steps lead to the following: 

P (failure) < fc^ne""" + A;n (1 - (1 - X)d/n)"' , (12) 

It follows that the above expression approaches zero if $m = \Omega(n \log n)$. Furthermore,
$\alpha$ can be chosen arbitrarily close to zero. Therefore, an upper bound on the required number
of measurements for successful recovery with high probability is given by:

$$M = \Omega(k \log N \log\log N). \qquad (13)$$



M&M Algorithm. We take k = A2"'^^°S2{V°/2+i/2)^ a,nd e,p according to Lemma |4.l[ it follows that: 

P (failure) < k^e'"''' + fcA™, (14) 
which asymptotically vanishes if $m = \Omega(\log k)$. Recall that the number of measurements in
this case is determined by the matrix $A = (A_1^T, A_2^T)^T$ described in Theorem 3.4, and is equal
to $M = 2 \log N + m \times 2^d$. Therefore, an upper bound on the required number of measurements
for successful recovery with high probability is given by:

$$M = 2 \log N + \Omega(k \log k). \qquad (15)$$

In particular, when $k = o(\log\log N)$, this means that only $O(\log N)$ measurements are required,
and the running time of the algorithm is $O(\log N)$ (see Section 3), both of which are almost
optimal.

5 Simulations 

Since Alg. 2 is only efficient for very small values of $k$, we present the empirical performance of
Alg. 1. Due to the efficiency of the method, it is possible to perform simulations for very large
values of $N$. In Figure 2, the empirical required oversampling rate for Alg. 1 with the proposed
constructions is plotted versus the signal dimension $N$, for various sparsity levels $k$. The
criterion here is that the probability of successful recovery be larger than 90%. Note that when $N$
is increased by 3 orders of magnitude, the required number of measurements increases by a factor of
3, which indicates the logarithmic dependence of $M$ on $N$. Furthermore, as the signal becomes less
sparse (i.e. $k$ increases), the required oversampling factor decreases. For $k = 100$, this ratio
is only about 3 for $N = 1024$, and about 8 for $N = 3.3 \times 10^{\ldots}$. This is significantly
better than existing sublinear recovery algorithms. Note that the optimal value of $d$ for
constructing the measurement matrices is found empirically for every pair $(k, N)$.
Figure 2: Required oversampling rate for successful recovery of Alg. 1 on the proposed constructions
versus signal sparsity levels.

In Figure 3, the probability of successful recovery is plotted against the sparsity level for
$N = 32768$, and $M = 140$ and $240$. We can see that although the number of measurements has only
increased by a factor of 1.7, the recoverable sparsity (given a fixed probability of success) has
improved in some cases by a factor of 5. These curves are comparable with the performance of
$\ell_1$ minimization over dense matrices with $N = 900$, as displayed, which is an indication of
the strong performance of the proposed scheme.

Figure 3: Probability of successful recovery of Alg. 1 versus sparsity level $k$, for $N = 32768$
and $M = 140, 240$, and the same curves for $\ell_1$-minimization over i.i.d. Gaussian matrices with
$N = 900$.